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Abstract — Spectrum sensing receives much attention recently 
in tlie cognitive radio (CR) network research, i.e., secondary 
users (SUs) constantly monitor channel condition to detect the 
presence of the primary users (PUs). In this paper, we go beyond 
spectrum sensing and introduce the PU separation problem, which 
concerns with the issues of distinguishing and characterizing PUs 
in the context of collaborative spectrum sensing and monitor 
selection. The observations of monitors are modeled as boolean 
OR mixtures of underlying binary sources for PUs. We first 
justify the use of the binary OR mixture model as opposed to 
the traditional linear mixture model through simulation studies. 
Then we devise a novel binary inference algorithm for PU 
separation. Not only PU-SU relationship are revealed, but PUs' 
transmission statistics and activities at each time slot can also 
be inferred. Simulation results show that without any prior 
knowledge regarding PUs' activities, the algorithm achieves high 
inference accuracy even in the presence of noisy measurements. 

I. Introduction 

With tremendous growth in wireless services, the demand 
for radio spectrum has significantly increased. However, spec- 
trum resources are scarce and most of them have been already 
licensed to existing operators. Recent studies have shown that 
despite claims of spectral scarcity, the actual licensed spec- 
trum remains unoccupied for long periods of time [?]. Thus, 
cognitive radio (CR) systems have been proposed HI, Q, [31 
in order to efficiently exploit these spectral holes, in which 
licensed primary users (PUs) are not present. CRs or secondary 
users (SUs) are wireless devices that can intelligently monitor 
and adapt to their envkonment, hence, they are able to share 
the spectrum with the hcensed PUs, operating when the PUs 
are idle. 

One key challenge in CR systems is spectrum sensing, 
i.e., SUs attempt to learn the environment and determine the 
presence and characteristics of PUs. Spectrum sensing can be 
done at SUs individually or cooperatively |(4l, 10, with or 
without the assistance of infrastructure supports such as dedi- 
cated monitor nodes and cognitive pilot channel (CPC) IS, l?), 
E, [9|. Energy detection is one of the most commonly used 
method for spectrum sensing, where the detector computes 
the energy of the received signals and compares it to a certain 
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threshold value to decide whether the PU signal is present or 
not. It has the advantage of short detection time but suffers 
from low accuracy compared to feature-based approaches such 
as cyclostationary detection 121, From the prospective of 
a CR system, it is often insufficient to detect PU activities in a 
single SU's vicinity ("is there any PU near me?"). Rather, it is 
important to determine the identity of PUs ("who is there?") as 
well as the distribution of PUs in the field ("where are they?"). 
We call these issues the PU separation problem. 

To motive the need for PU separation, let us consider the 
following scenarios: 

• Multiple SUs cooperatively infer the activities of PUs, 
some of which may be observable to only a subset of 
SUs. In this case, the SUs need to identify the PU- 
SU adjacency relationships. Blindly assuming all PUs 
are observable to all SUs will lead to inferior detection 
results. 

• Dedicated monitors are employed for spectrum sensing. 
There exists redundancy in monitors' observations due to 
common PUs across multiple monitors. Such redundancy 
can be reduced by judiciously selecting a subset of moni- 
tors to report their spectrum sensing results. Furthermore, 
some monitors can be put to low-power modes for energy 
conservation. 

Clearly, PU separation is a more challenging problem com- 
pared to node-level PU detection. The conventional wisdom 
suggests that sophisticated methods such as feature-based 
detection are necessary. On the contrary, we find that through 
cooperation among monitors or SUs, not only accuracy of 
energy detection can be improved as been demonstrated in 
several existing work |10|, |4|, (5], but also PUs can be 
identified using solely binary information (due to thresholding 
in energy detection). The key to this surprising result is a 
binary inference framework that models the observations of 
SUs and monitors as boolean OR mixtures of underlying 
binary latency sources for PUs. It allows us to exploit the 
correlation structure among distributed binary observations. 
We develop an iterative algorithm, called Binary Independent 
Component Analysis (blCA), to determine the underlying 
latent sources (i.e., PUs) and their active probabilities. In 
bICA, no prior information regarding the PUs' activities or 
observation noise is assumed. Given m monitors or SUs, up 
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to 2™ — 1 PUs can be inferred resulting in great efficiency. 
Evaluation results show effectiveness of bICA under practical 
settings. 

Contributions: In this paper, we make the following contri- 
butions toward the design of a binary inference framework for 
PU separation in cognitive radio networks: 

• We introduce the PU separation problem with cooperative 
SU inference model and discuss its importance on CR 
systems. 

• We provide a stochastic analysis on the difference be- 
tween linear and binary PU energy detection models. 
Results from the study imply that using just binary 
observations from SUs has comparable accuracy with 
using a linear model, while incurring much less overhead. 

• We apply bICA to solve the PU separation problem 
without any assumption on the noise model or prior 
knowledge on the PU activities. We furthermore consider 
the inverse problem of inferring the detailed PUs' activi- 
ties given the SUs' observations and the inferred model. 

The rest of the paper is organized as follows. In Section [H] 
the observation model is introduced. A comparison between 
the linear and binary energy model and brief overview of 



related work are also presented. In Section III we present the 
bICA algorithm to determine the statistics of PU activities and 
the inference algorithm to decide which set of PUs are active. 
Formulation and solution to the inverse problem under noisy 
measurements are presented in Section |IV] Evaluation results 
are detailed in Section [V] followed by an overview of related 
work is provided in Section VI followed by conclusions in 
Section EJ 

II. Model and Preliminary 

Consider a slotted system in which the transmission activ- 
ities of n PUs are modeled as a set of independent binary 
variables y with active probabilities ^'(y). The binary ob- 
servations due to energy detection at the m monitor nodes 
(for the remaining of the paper, we do not distinguish monitor 
nodes and SUs) are modeled as an m-dimension binary vector 
X = [xi,X2, ■ ■ ■ ,Xrn]'^ with joiut distribution 7'(x). It is 
assumed that presence of any active PU surrounding of a 
monitor leads to positive detection. An unknown binary mixing 
matrix G„ixn is used to represent the relationship between 
the observable variables in x and the latent binary variables 
in y = [yi, 2/2, ■ • • , Vn]'^ as follows: 
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(1) 



where A is Boolean AND, V is Boolean OR, and gij is the 
entry on the ith row and the jth column of G. For ease of 
presentation, we introduce a short-hand notation as 



Gi 



(2) 



In essence, gij encodes whether monitor i can detect the 
transmission of PU j. For a monitor i, the energy detection 
returns 1 when the monitor can detect one or more active PUs. 




Fig. 1: A sample network scenario with number of SUs m — 5, 
number of PUs n = 4 and its bipartite graph transformation. White 
circles represent PUs, black circles represent SUs and dashed lines 
illustrate SUs' monitor range. 



G can be seen as the adjacency matrix of an undirected bi- 
partite graph G — {U,V,E), where U — {xi,X2, ■ ■ ■ ,Xm} 
and V = {yi,y2, - ■ ■ ,yn}- An edge e = {xi,yj) exists if 
gij — 1. Illustration of a sample network scenario and its 
bipartite graph is presented in Figure [T] 

Consider an to x T matrix X, which is the collection of T 
realizations of vector x. The goal of bICA is to determine 
the distribution of the latent independent random variables 
y and the binary mixing matrix G from X, such that X 
can be decomposed into the mixing of realizations of y. 
From G and y, we can identify the PUs and additionally 
infer PUs' activities at different time slots. Note that in Q, 
measurement noise is not explicitly modeled, rather, is treated 
as independent sources. 

A. Why Binary Inference? 

In this section, we motivate the use of a binary inference 
framework by considering an alternative linear mixing model. 
In the linear model, at each time slot, the received signal power 
at each monitor can be modeled as a linear combination of 
the transmitted signal power from active PUs. More specifi- 
cally. Let V = [vi,V2,. ■ ■ ,v,nV , z = [zi,Z2,...,ZnY, and 
n = [ni, 712, . . . , nmY' be the random vectors corresponding 
to the received, transmitted signal power and the Gaussian 
noise respectively, and H is the mxn unknown channel gain 
matrix. Both large-scale path loss with propagation loss factor 
a and small-scale fading following the Rayleigh distribution 
are considered in this model. The received signal power is a 
linear mixture of the transmitted signal power and the noise: 

V = Hz + n. (3) 

Hi is a random variable with mean /i^. If v can be ob- 
served directly, classical linear independent component analy- 



3 



0.12| . 

-©-False alarm 
n 1 _ - X - Miss detection 



^ ° o o o-e-Q -o o €> ©-©-o o- «- o-o 

!5 \ 
^ 0.06- X 
o ^ 

0.04 \ ^ V -^^^ X 

w X \ / \ / 

X » X / 

0.02 1 / \ V 

qI ^ ^ 

5 10 15 20 

Number of PUs 

Fig. 2: False alarm and miss detection probability. Threshold r is set 
at 5dB. 



sis (ICA) ifm can be applied to determine H and z. However, 
this method suffers from three problems. First, z/s have to 
be non-Gaussian to be recoverable. Second, the channel gain 
matrix H tends to vary over time. Lastly, communicating 
the realizations of z to a centralized server for inference 
incurs higher overhead compared to its binary counter part 
X. Furthermore, when only the binary quantized values of v 
are observable (i.e., x), ICA is not longer suitable due to the 
non-linearity of the quantization function. 

Next, we compare accuracy of the binary OR mixture model 
against a quantized version of the above linear model, namely, 
x' = U{v — r), where U{-) is a step function defined as 
U{r) = 1 if r > 0, U{t) — otherwise, and t is a pre-set 
threshold. We are interested to see the degree of "information 
loss" due to the OR approximation. In the simulation, 10 
monitors and n PUs are deployed in a 500x500 square meter 
area with n varying from 5 to 20. Locations of PUs are 
chosen arbitrarily with the restriction that no two PUs can 
be observed by the same set of monitors. The PUs' transmit 
power levels are fixed at 20mW, the noise floor is -95dbm, and 
the propagation loss factor is 3. The SNR detection threshold 
for the monitors is set to be 5dB (above the noise floor). The 
value is chosen so that the false alarm probability (PU's are 
reported while none exists) is less than 10%. Elements of the 
binary mixing matrix G are either 1 or 0, depending on the 
received signal for one respective PU only. In other words, 
gij = 1 if the ith SU can detect transmissions from the jth 
PU. The PUs' activities are modeled as a two-stage Markov 
chain with transition probabilities uniformly distributed over 
(0,1). We run the simulation for T = 5,000 slots and obtain 
the observations X and Xi for the binary OR and linear 
mixing model, respectively. Figure |2] shows the false alarm 
and miss detection probability in the binary OR model using 
the results from the linear mixing model as the ground truth. 
From Figure |2] we see the two models have very close 
performance. We also experiment the case in which the initial 
phases of PU's transmitted signal vary in [0, 27r], and have 
similar observations. 

To this end, we conclude that the binary model is a good 
approximation of the quantized linear model. As will be 



demonstrated later, efficient algorithms can be devised for the 
PU separation problem under the binary model. 

III. Binary Inference Framework 

In this section, we first discuss the identifiability of binary 
independent sources for the PUs from OR mixtures observed 
at the monitors, and then present an inference algorithm 
that determines the unknown mixing matrix and underlying 
sources. 

A. Identifiability 

For an m-dimension binary random vector x, the number 
of different realizations is 2™. From the data matrix X, 
distribution of x can be estimated in a non-biased manner as 
the number of observations goes to infinity. We can initialize 
n = 2™ — 1 and G matrix of dimension m x (2™ — 1) 
with rows being all possible binary combinations of length n 
(with exception of the all-0 entry). This results in a complete 
bipartite graph, in which an edge exists between any two 
vertices in U (the set of monitors) and V (the set of signal 
sources), respectively. For a random variable yj e V, its 
neighbors in U is given by the non-zero entries in gj (i.e., the 
jth column of matrix G). Thus, at most 2™ — 1 independent 
components can be identified. 

Define pi =V{yi = 1). Let the set 

= |y I V (.9*j Ay,) = a;„Vz| . 

Therefore, 

V{^) = 7'(yel^(x)) = EyeY(x)^(y) 

= Eyer(x)nrrVr(i-^'0^-^ 

where Viy) is the joint probability of y. The last equality 
holds due to the independence of j/^'s. 

Given the distribution of random vectors x e {0, 1}™, 
2™ — 1 independent equations can be obtained from (|4]) due 
to the OR mixture model in ([TJ. Since there are 2™ — 1 
unknowns (i.e., pi,l = 1,...,2™ — 1), their values can be 
explicitly determined. Clearly, ambiguity exists if two or more 
independent sources have the same set of neighbors in U 
(or equivalently, identical columns in G). In this case, binary 
information is insufficient to distinguish these sources. 

The set of equations in (|4]) are polynomials of sum product 
forms, which are difficult to solve. This necessitates the design 
of specialized algorithms. In the rest of the paper, abusing the 
notation a bit, we denote G the m x (2™ — 1) adjacency 
matrix for the bipartite graph G' — {U,V' , E'), where U = 
{xi,X2, ■ ■ ■,x„i} and V = {yi,y2, ■ ■ • ,y2'"-i} (i-e. the set of 
all possible uniquely identifiable latent sources). Furthermore, 
we arrange G in the order such that gki = 1 if (Z <C fc) = 1, 
for k — 0,...,™ — 1, where ^ is the bit shift to the left. If 
the resulting pi ~ for some I, this implies the corresponding 
column gi can be removed from G. For an example, consider 
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the network scenario with m = 3, the initialized G matrix 
will be: 



G = 



10 10 10 1 
110 11 
1 1 1 1 



with rows corresponding to 3 monitors and columns corre- 
sponding to 7 identifiable PUs. 

B. Inference algorithm 

Before proceeding to the details of the proposed algorithm, 
we first present a few related technical lemmas as follows. 

Lemma 1: Consider a set x ~ [xi, a;2, . . . , x^-i, x/j]^ 
generated by the data model in ([TJ, i.e., 3 binary independent 
sources y, s.t., x = G (g) y. The conditional random vector 
'^xh=o ~ [xitX2, ■ ■ ■ TXh-i\xh = 0]^ corresponds to the 
vector of the first ft, — 1 elements of x when Xh ~ 0. Then, 
'^xh=o — G' ® y', where G' = G. i 2'^-i (i.e. the first 



2^^^ columns of G) and V{y[ = 1) 



V{yi = 1) for 



? = l,...,2''-i. 

Proof: We first derive the conditional probability distri- 
bution of the first h — 1 observation variables given Xh = 0, 

V(xi,X2, ■ ■ .,Xh-l \xh^O) 
= 'P{xi,X2,-..,Xh-i\Xh=0)V{Xh=0) 
2''-! 

yeY(x) '=1 

= E n pn^-pi)'-'' n (^~p^) 

yi = 0,y 9hi = 1 

(5) 

since V{xh = 0) = ]^ (1 ~ pi), we have 

'P{xi,X2, ■ . ■,Xh-l I Xh = 0) 

E n ^'^(l-p')'"''■ 
yl,...>-l e F(xi,...,;,_i) 
yi = 0, Vghi = 1 

(6) 

Clearly, by setting Viy'i = 1) = V{yi ^ I) foi I ^ 
1,...,2''^^, the above equality holds. In the other word, 
the conditional random vector x^-^^q = G' (S) y' for G' = 

G':4,...,2''-l- ■ 

The above lemma establishes that the conditional random 
vector Xa;^=o can be represented as an OR mixing of 2^^^ 
independent components. Furthermore, the set of the indepen- 
dent components is the same as the first 2''^^ independent 
components of x (under proper ordering). 

Consider a sub-matrix of data matrix X, X'^h-i)xT^ where 
the rows correspond to observations of xi,X2, - ■■ ,a;/i-i for 
t — 1,2, . . . ,T such that Xht = 0. Define X(^h-i)xT7 which 
consists of the first /i — 1 rows of X. Suppose that we 
have computed the bICA for data matrices X'^h-i)xT ^i^^ 



is real- 



^{h-i)xT- From Lemma Tj we know that X'^^^i-^^rp 
ization of OR mixtures ofindependent components, denoted 
by yO,_i. Furthermore, V[y°^^^{l) = 1] - Viyi = 1) for 
I = 1,...,2'^^^. Clearly, Xqi-i)xt is realization of OR 
mixtures of 2''^^ independent components, denoted by y2ft-i. 
Additionally, it is easy to see that the following holds: 



ny2ih-i)ii) = M 

= 1 - [1 - 7'(yO._i (0 = 1)] [1 - r{yi+2^- 



= 1)] 



where 1 = 1, 

Pi 

Pl+2>- 



.,2 



Therefore, 



= 7'(y^,_i(0 = l), 



= 1 - 



_ j^(xh=iAxi=o,yie[i...h-i]) 

P2'-i+i - n,=i.....,^..-i_i(i-w) ■ 



1 = 1,.. 

1 = 2,.. 



(7) 

The last equation above holds because realizations of x where 
(xk = 1 while Xi = 0; Vi G {0, . . . , k — 1}) are generated 
from OR mixtures of y2fc-i's only. Define J^{A) as the 
frequency of event A. To this end, we have the following 
iterative algorithm as illustrated in Algorithm [T] 

When the number of monitors m = 1, there are only 
two possible unique sources, one that can be detected by the 
monitor, denoted by [1]; and one that cannot, denoted by [0]. 
Their active probabilities can easily be calculated by counting 
the frequency of (a:i = 1) and {xi = 0) (lines [T| - [3|l. If 
m > 2, we apply Lemma [T] and (|7| to estimate p and G 
through a recursive process. We invoke FindBICA on two 



sub-matrices X 





(m-l)xT 



and X(m_i)xT computed from X 



to determine Pi...2'"-i and p'j^ 2">-i' ^^^^ infer j52™-i+i...2'" 
as in (j7]i (lines|6]-[8]l. Finally, p^ and its corresponding column 
gii in G are pruned in the final result if ph < s (lines |9]-[TT]i. 

Algorithm 1: Incremental binary ICA inference algorithm 

FindBICA (X) 

input : Data matrix X^xt 

init : n = 2™ - 1; 

Ph = 0,h = 1, . . . ,n; 

G = m X (2™ — 1) matrix with rows corresponding all possible 
binary vectors of length m; 

e = the minimum threshold for ph to be considered a real 
component; 

1 if m = 1 then 



Pi 

P2 



else 



0) ; 

1) ; 



ft. ..2—1 = FindBICA (X^„_i)xj,); 
P'l...2"-i = FindBICA (X(„_i)xt); 



1,-1 
for ; = 2 



ft" 



9 for h 



ft+2™-l = 1 - 

= l,...,2'" do 



1-Pl ' 
= lA3;i=0.Vig[l...m-l]) 
2'"-l,i^2">-l+l (l-Pi)' 



10 
11 



if {ph < e) V {ph = 0) then 
1^ prune ph and corresponding columns gh'. 



12 output: p and G 
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Computation complexity: Let S{m) be the computation time 
for finding bICA given XmxT- From Algorithm [T] we have, 

S'(m) = 2S'(m- 1) +c2", (8) 

where c is a constant. It is easy to verify S{m) — cm2"^. 
Therefore, Algorithm [T] has an exponential computation com- 
plexity with respect to m. This is clearly undesirable for large 
m's. However, we notice that in practice, correlations among 
Xi's exhibit locality, and matrix G tends to be sparse. Instead 
of using a complete bipartite graph to represent G, the degree 
of vertices in V (or the number of non-zero elements in gk) 
tends to be much less than m. More specifically, for every pair 
of monitors i and k, we compute the covariance between their 
observations: 



!{i,k) = 



(9) 



T T T 

If cov{i, k) < e, where e is a small value (e.g., the upper 
confidence bound of cov{i, k) estimate), we can remove the 
corresponding columns in G and elements in y. 

IV. The Inverse Problem 

Now we have the mixing matrix Gmxn and the active 
probabilities 7^(y), given observation XmxT, the inverse 
problem concerns inferring the realizations of the latent vari- 
ables YnxT- Extracting multiple PUs' activities from the OR 
mixture observations is a challenging but important problem in 
cognitive radio networks. Interesting information, such as the 
PU channel usage pattern can be inferred once Y is available. 
The SUs will then be able to adopt better spectrum sensing and 
access strategies to exploit the spectrum holes more effectively. 

Recall that n is the number of PUs (latent variables). Denote 
Ui a binary variable for the ith latent variable. Let x = G (g) y. 
We assume that the probability of observing X given x 
depends on their Hamming distance d{x, X) — ~ 

and r{x\X) = pi^^'^\l - where pe is 

the error probability of the binary symmetric channel. To 
determine y, we can maximize the posterior probability of 
y given X derived as follows, 

_ v{X\y}v{y} 
- nX} 
_ v{X\y}v{y} 

V{X} 
W v{X^x\y}v{y} 

nX} 
W v{X\x}v{y} 
v{X} 

nT=inx,\x,}u]^,v{v.} 

viX} 



where x = G (E) y. (a) and (&) are due to the deter- 
ministic relationship between x and y. Recall that Xi = 
VjLi {gij ^yj),'i — 1, . . . , m. With M is a "large enough" 
constant, we can use big-M formulation llT2l to relax the 
disjunctive set and convert the above relationship between x 
and y into the following two sets of conditions: 

< YTi=i9i]y]i = 1, 



, , m. 



Here, since dijVj — '^^ ^'^^ -^'^ = Finally, 

taking log on both sides and introducing additional auxiliary 
variable — \Xi — Xi\, we have the the following integer 
programming problem: 

m 

max . [ui \ogPe + (1 - cti) log(l - p^)] 

a,y ^ 

+ E"=i [(1 - Vj) log (1 - Pj) + Vj logPj] 

n 

s.t. Xi<'^gijyj, Vi = l,...,m, 

n 

n- Xi>^gijyj, Vi = l,...,m, 

ai > Xi — Xi, Vi = 1, . . . , m, 

C(i > Xi — Xi, Vi = 1. . . . , m, 

ai,Xi,yj = {0, 1} , Vi = 1, . . . ,m, j = 1, . . . ,n. 

(11) 

This optimization function can be solved using ILP solvers. 
Note that p,, can be thought of the penalty for mismatches 
between Xi and Xi. 

Zero Error Case: If X is perfectly observed, containing no 
noise, we have Pe = and ai = X; — JC^ = 0, or equivalently, 
Xi = Xi. The integer programming problem in ( [TT| ) can now 
be simpUfied as: 

n 

max . ^ [(1 - J/j) log (1 - Pj) + yj logp,] 



3.t. Xi<^gijyj,\Ji = l,...,m, 



(12) 



n- Xi> ^gijyj,yi = 1, . . . ,m, 

yj = {0,1}, = 

Clearly, the computation complexity of the zero error case is 
lower compared to ( [TT| . It can also be used in the case where 
prior knowledge regarding the noise level is not available. 

V. Evaluation 

In this section, we first introduce the performance metrics, 
and then present evaluation results on a synthetic data set 
varying the number of PUs. 

A. Performance metrics 

We denote by p and G the inferred active probability of 
PUs and the inferred mixing matrix, respectively. 

1) Structure Error Ratio: This metric indicates how ac- 
curate the mixing matrix is estimated. It is defined by the 
Hamming distance between G and G divided by its size. 



(13) 



M -x^ > 5»j2/j , Vz = 1, . . . , m. 



(10) 



To estimate Hg however, two challenges remain: First, the 
number of inferred independent components may not be 
identical as the ground truth. Second, the order of independent 
components in G and G may be different. 
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Algorithm 2: Bipartite graph matching algorithm 

MatchPG (G, G, p) 

input : Gmxn, Gmxn, Pixn', (n < fl < 2™) 
init : Gmxn ~ 0; Pixn ~ 0; Caxa — 0; 

1 for i = 1, . . . , n do 

2 for j = 1 , . . . , n do 

3 if Pi = then 

4 I = d"{g,,gj) X m; 
else 

L Cij = d"{g^,gj)\ 

6 A = BipartiteMatching (C); 

7 for i = 1, . . . , n do 

8 find j such that a^j = 1; 

11 Prune g': g' = g[,.,n', 

12 Prune p': p' = j3'i „; 

13 output: G and p 



To solve the first problem, we can either prune G or 
introduce columns into G to equalize the number of com- 
ponents in = n, where fi is the number of columns in G). 
For the second problem, we propose a matching algorithm 
that minimizes the Hamming distance between G and G by 
permuting the corresponding columns in G. 

Structure Matching Problem: A naive matching algorithm 
needs to consider all n\ column permutations of G, and 
chooses the one that has the minimal Hamming distance to G. 
This approach incurs an exponential computation complexity. 
Next, we first formulate the best match as an ILP problem. 
Denote the Hamming distance between column gi and gj as 
Cij > 0. Define a permutation matrix Anxn with = 1 
indicating that the zth column in G is matched with the jth 
column in G. The problem now is to find a permutation 
matrix such that the total Hamming distance between G and 
G (denoted by d^{G,G)) is minimized. We can formulate 
this problem as an ILP as follows: 

n n 



1=1 j=i 

n 

5.t. ^flij = 1, 



(14) 



a^j = 0, 1 V«,j = 1, . . . ,n. 

The constraints ensure the resulting A is a permutation ma- 
trix. This problem can be solved using ILP solvers. However, 
we observe that the ILP is equivalent to a maximum-weight 
bipartite matching problem. In the bipartite graph, the vertices 
are positions of the columns, and the edge weights are the 
Hamming distance of the respective columns. If we consider 
d^{gi,gi), the Hamming distance between column gi and g^, 
to be the "cost" of matching gi to gi, then the maximum- 
weight bipartite matching problem can be solved in 0{n^) 




Fig. 3: From top to bottom: inferred matrix G with 18 inferred 
components, transformed matrix G with only 10 components re- 
maining (8 noisy ones were removed), original G and difference 
matrix between G and G . Black dot = 1 and white dot = 0. 



running time |fT3l|, where n is the number of vertices. The 
algorithm requires G and G to have the same number of 
columns. 

One greedy solution is to prune G by selecting the top 
n components from G, which have the highest associated 
probabilities pi since they are the most likely true components. 
However, when T is small and/or under large noise, we 
may not have sufficient observations to correctly identify 
components in G with high confidence. As a result, true 
components might have lower active probabilities comparing 
to the noise components. To address the problem, we instead 
keep a larger n and introduce h — n artificial components into 
G. These components will be represented by zero columns 
in G. While matching the inferred columns in G to the 
columns in G, clearly an undesirable scenario occurs when we 
accidently match a column in G to an additive zero column 
in G. This happens when an inferred column gi is sparse (i.e. 
having a very small Hamming distance to the zero column). 
To avoid the incident, we multiply the cost of matching any 
column in G to a zero column in G by m. This eliminates the 
case in which a column gi is matched with a zero column in 
G, since it is more expensive than matching with another non- 
zero column gi. We can now select the best n candidates in G, 
which yields a reduced mixing matrix G of size my. n, and 
elements in active probability vector p' will also be selected 
accordingly. The solution to the structure matching problem 
is detailed in Algorithm |2l 

In the algorithm, lines^ - 15] build the input weight matrix 
Cnxn for the bipartite matching algorithm. If gi is a zero 
column, Cij will be scaled by m to avoid the matching between 
column gi and gj (line [4|. The bipartite matching algorithm 
finds the optimal permutation matrix A to transform G into G 
that is "closest" to G (lines [6l-[T0|. We are only interested in 
the first n columns of G and p' (as they most likely represent 
the true PUs). Therefore, G and p' are pruned in lines [lT|- 

m 
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As an example, the inferred result of a random network 
with n = m = 10 is given in Figure [3] Non-zero entries and 
zero entries of G, G, and G are shown as black and white 
dots, respectively. The entry-wise difference matrix \G~G \ is 
given in the bottom graph. Gray dots in the difference matrix 
indicate identical entries in the inferred G and the original G; 
and black dots indicate different entries (and thus errors in the 
inferred matrix). In this case, only the first row (corresponding 
to the first monitor xi) contains some errors. 

2) PU Active Probability Error Ratio: Prediction error in 
the inferred active probabilities of PUs is measured by the root 
mean square error ratio between p' and p, defined as follows: 



V n I n ' 



(15) 



PU active probability error ratio can be interpreted as the 
fraction of the inferred active probability that deviates from 
the true values. 

3) Miscount in the number of PUs: Accuracy of the 
inference algorithm can also be assessed by evaluating the 
difference between the number of inferred PUs n and the 
real number of PUs n in the system. Clearly, with a smaller 
threshold value e in Algorithm[T| the number of inferred PUs h 
may increase. In the subsequent experiments, we fix e = 0.01 
and evaluate the changes in fi as the real number of PUs 
increases from 5 to 20. 

4) PU Activity Error Ratio: After applying FindBICA in 
Algorithm [T| on the measurement data of length T to obtain 
G and p, realizations of the hidden causes (i.e. PUs) can 
be computed by solving the maximum likelihood estimation 
problem in (111. We define 

A 



Ho, — 



1 

riT 



(16) 



where j/j is the i'th column of Y . Similar to Hg, this metric 
measures how accurately the PUs' activity matrix is inferred 
by calculating the ratio between the size of y and the absolute 
difference between y and y- 

B. Experimental results 

For evaluation, we consider the same simulation scenario 



as in Section II- A where 10 monitors are randomly deployed 
to monitor n PUs using a single channel. Algorithms are 
implemented in Matlab, and all experiments are conducted 
on a workstation with an Intel Core 2 Duo T5750@2.00GHz 
processor and 2GB RAM. Noise is introduced by randomly 
flip a bit in the observation matrix X from 1 to (and vice 
versa) at probability e. e is set at 0%, 2%, and 5% in our 
simulations. All presented results are averages of 50 runs 
with different initializations. In the experiments, we vary the 
number of PUs n and the number of observations T. 

1) Varying the number of PUs: In the first set of exper- 
iments, we fix the sample size T = 10, 000 and vary the 
number of PUs from 5 to 20 to study its impact on the accuracy 
of our method. Experiment results over 50 runs for each PU 
setting are shown in Figure]?] In absence of noise, we observe 
that the inferred mixing matrix G is mostly correct even for 
a large number of PUs (Figure |4]^a)). As the number of 



PUs increases, errors in the inferred active probabilities and 
the inferred number of PUs tend to increase though within 
10% and 0.65 as shown in Figure |4|b) and (c). Recall that 
the PUs are ordered based on their respective columns in 
G. Therefore, errors in G may have a cascading effect on 
PU active probability error ratio since we may compare the 
wrong pair of PUs as a result. Performance of the algorithm 
tends to degrade with more noises. In particular, as shown in 
Figure |4|c), more components are inferred compared to the 
ground truth. This is because when the noise probability p^ is 
greater than the threshold value, some noise components are 
erroneously introduced. 

The errors in determining the set of active PUs are shown 
in Figure |4|d). We can see again that the proposed algorithm 
achieves remarkable accuracy at zero-noise level. Prediction 
error on Y is only 1% for 5 PUs, and gradually increases up 
to 5% for 20 PUs. At 2% and 5% noise levels, performance 
degrades as the prediction error goes up to about 10%. Noise 
has two effects on the solution to the inverse problem. First, the 
inferred mixing matrix and active probability can be erroneous. 
Second, no maximum likelihood estimator guarantees to give 
the exact result when the problem is under or close to being 
under-determined with noisy measurements. In fact, we have 
experimented with the case in which G and p are both known, 
the results are similar. This implies that the main source 
of errors in solving the inverse problem comes from the 
problem itself being under-determined or close to being under- 
determined. 

Finally, as shown in Figure ^s), the computation time of 
proposed algorithms is negligible, mostly under 0.2 second 
without noise. With noise, computation time increases but 
is similar at the two noise levels (differing by less than 0.5 
second). The presence of noise may introduce noise compo- 
nents and render the estimation of correlation (in Equation (|9]l) 
inaccurate. Thus, higher processing time entails. 

2) Varying the size of observations T: In the second set of 
experiments, we fix the number of PUs n = 10 and study the 
impact of the observation size T. A small T (and thus insuf- 
ficient observations) would lead to higher uncertainty while 
a large T incurs higher computation overhead. Furthermore, 
if T is too small, some PUs may never be active in the 
trace, making them impossible to be inferred. It is therefore 
interesting to investigate the effect of T on the accuracy and 
computation overhead of the proposed algorithm. 

Experiment results over 50 runs for each observation size 
are shown in Figure |5] As expected, the structure error ratio 
and the PU active probability error ratio reduce significantly 
as T increases from 50 to 1,000 (Figure |5ja) and (b)). If we 
further increase T from 1,000 to 10,000, the performance gain 
is somewhat marginal. However, the computation time grows 
considerably since it takes longer to process the observations 
(Figure |5je)). From Figure |5jc) and (d), we also see that 
the miscount of PUs and the PU activity error ratio are not 
sensitive to the sample size T, but are more affected by the 
noise level. 
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Fig. 4: Effects of noise level and the number of PUs on inference results. Noise is introduced at three levels: 0%, 2% and 5%. Results are 
averages of 50 runs with different initial seed. Symmetric error bars indicate standard deviations. 
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Fig. 5: Effects of noise level and the size of observations on inference results. The a;-axis is in logarithmic scale. 



VI. Related work 

Independent component analysis (ICA) has been studied in 
the past as a computational method for separating a multivari- 
ate signal into additive subcomponents supposing the mutual 
statistical independence of the non-Gaussian source signals. 
Most ICA methods assume linear mixing of continuous sig- 
nals iirn . A special variant of ICA, called Boolean Indepen- 
dent Component Analysis (BICA), considers boolean mixing 
(e.g., OR, XOR etc.) of binary signals. Existing solutions to 
BICA mainly differ in their assumptions of prior distribution of 
the mixing matrix, noise model, and/or hidden causes. In 1 14|, 
Yeredor considers BICA in XOR mixtures and investigates 
the identifiability problem. A deflation algorithm is proposed 
for source separation based on entropy minimization. In [14| 
the number of independent random sources K is assumed 
to be known. Furthermore, the mixing matrix is an K-hy- 



K invertible matrix. In [15], infinite number of hidden causes 
following the same Bernoulli distribution are assumed. Re- 
versible jump Markov chain Monte Carlo and Gibbs sampler 
techniques are applied. In contrast, in our model, the hidden 
causes may follow different distribution and the mixing matrix 
tends to be sparse. Streich et al. |16| study the problem 
of multi-assignment clustering for boolean data, where the 
observations either from a signal following OR mixtures or 
from a noise component. The key assumption made in this 
work is that the elements of matrix X are conditionally 
independent given the model parameters. This greatly reduces 
the computational complexity and makes the scheme amenable 
to gradient descent optimization solution. This assumption is 
in general invalid. In 1 17 1, the problem of factorization and de- 
noise of binary data due to independent continuous sources 
is considered, which follow beta distribution. Finally, Iil5]| 
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consider under-presented case of less sensors than sources 
with continuous noise, while fTT^ and (TE\ deal with over- 
determined case, where the number of sensors is much larger 
In this work, we consider primarily the under-presented cases 
encountered in data networks. 

There exists a large body of work on blind deconvolution 
with binary sources in the context of wireless communica- 
tion ifTSl . IIT9I . In time-invariant linear channels, the output 
signal x{k) is a convolution of the channel realizations a{k) 
and the input signal s{k), k = 1,2, . . . , K as follows: 

L 

x(k) ^^a{l)s{k-l),k = l,...,K. (17) 

(=0 

The objective is to recover the input signal s. Both stochastic 
and deterministic approaches have been devised for blind 
deconvolution. As evident from ^Tf) , the output signals are 
linear mixtures of the input sources in time, and additionally 
the mixture model follows a specific structure. 

Literature on boolean/binary factor analysis (BFA) is also 
related to our work. The goal of BFA is to decompose a 
binary matrix X„ixT into A„ixn ^ BnxT with (g) being the 
OR mixture relationship as defined in ([T]i. We use the same 
notation of m, n, and T to illustrate the relationship between 
BFA and bICA. X in BFA is often called an attribute-object 
matrix providing m-dimension attributes of T objects. A and 
B are the attribute-factor and factor-object matrices. All the 
elements in X, A, and B are either or 1. n is defined 
to be the number of underlying factors and is assumed to 
be considerably smaller than the number of objects T. BFA 
methods aim to find a feasible decomposition minimizing n. 
Frolov et al. study the problem of factoring a binary matrix 
in a series of papers 1201 . ETI . Il22ll using Hopfield neural 
networks. This approaches are based on heuristics and do 
not provide much theoretical insight regarding the properties 
of the resulting decomposition. More recently, Belohlavek et 
al. propose a matrix decomposition method utilizing formal 
concept analysis [23 1. The paper claims that optimal decom- 
position with the minimum number of factors are those where 
factors are formal concepts. It is important to note that even 
though BFA assumes a similar disjunctive mixture model to 
our problem, the objective is different. While BFA tries to 
find a matrix factorization so that the number of factors are 
minimized, bICA tries to identify independent components. 
One can easily come up an example, where the number of 
independent components (factors) is larger than the number 
of attributes, while BFA always finds factors no larger than 
the number of attributes. 

VII. Conclusions 

In this paper, we introduced the PU separation problem 
for cognitive radio networks and argue its relevance in col- 
laborative spectrum sensing and monitor resource allocation. 
We demonstrated that a binary mixing model is sufficient to 
characterize the behavior of energy detectors in presence of 
multiple PUs, and devised a binary inference framework to 
resolve the PU separation problem. The results are somewhat 
surprising that PUs can be accurately separated and identified 



with only binary observations from the set of monitors to 
which they are observable. Simulation validation shows that 
the PU-SU relationship as well as the PUs' statistics and 
activities can be estimated with high accuracy when the noise 
is marginal. 
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